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VECTOR FITTING FOR MATRIX-VALUED RATIONAL APPROXIMATION 

Z. DRMAC*, S. GUGERCIN+, AND C. BEATTIE+ 


Abstract. Vector Fitting (VF) is a popular method of constructing rational approximants that provides a least 
squares fit to frequency response measurements. In an earlier work, we provided an analysis of VF for scalar-valued 
rational functions and established a connection with optimal I-L 2 approximation. We build on this work and extend 
the previous framework to include the construction of effective rational approximations to matrix-valued functions, 
a problem which presents significant challenges that do not appear in the scalar case. Transfer functions associated 
with multi-input/multi-output (MIMO) dynamical systems typify the class of functions that we consider here. Others 
have also considered extensions of VF to matrix-valued functions and related numerical implementations are readily 
available. However to our knowledge, a detailed analysis of numerical issues that arise does not yet exist. We offer 
such an analysis including critical implementation details here. 

One important issue that arises for VF on matrix-valued functions that has remained largely unaddressed is the 
control of the McMillan degree of the resulting rational approximant; the McMillan degree can grow very high in the 
case of large input/output dimensions. We introduce two new mechanisms for controlling the McMillan degree of the 
final approximant, one based on alternating least-squares minimization and one based on ancillary system-theoretic 
reduction methods. Motivated in part by our earlier work on the scalar VF problem as well as by recent innovations 
for computing optimal I-L 2 approximation, we establish a connection with optimal I-L 2 approximation, and are able to 
improve significantly the fidelity of VF through numerical quadrature, with virtually no increase in cost or complexity. 
We provide several numerical examples to support the theoretical discussion and proposed algorithms. 
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1. Introduction. Rational functions provide significant advantages over other classes of ap¬ 
proximating functions, such as polynomials or trigonometric functions, that are important in the 
approximation of functions that occur in engineering and scientific applications. Matrix-valued ra¬ 
tional functions offer substantial additional flexibility and broaden the domain of applicability by 
providing the potential for the interpolation and approximation of parameterized families of multi¬ 
dimensional observations. For example, in a variety of engineering applications, the dynamics arising 
from multi-input/multi-output (MIMO) dynamical systems may be inaccessible to direct modeling, 
yet input-output relationships often may be observed as a function of frequency, yielding an enor¬ 
mous amount of data. In such cases, one may wish to deduce an empirical dynamical system model 
nominally represented as a matrix-valued rational function, that fits the measured frequency response 
data. This derived model may then be used as a surrogate in order to predict system behavior or to 
determine suitable control strategies. See mmm for examples of rational approximation in action. 

The McMillan degree of a matrix-valued rational function, H(s), is the sum of pole multiplicities 
over the (extended) complex plane, or equivalently, the dimension of the state space in a minimal 
realization of H(s). It is convenient to think of H(s) as a transfer function matrix associated 
with a stable MIMO linear time-invariant system having m inputs and p outputs (although this 
interpretation is not necessary for what follows); McMillan degree is a useful proxy for the level of 
complexity associated with H(s). Let r > 0 be an integer denoting the desired McMillan degree 
for our approximant: H r (s) = N(s)/d(s), where N(s) is a p x m matrix having elements that are 
polynomials in s of order r — 1 or less, and d(s) is a (scalar) polynomial function having exact order 
r. Denote by lZ r the set of matrix-valued functions of this form. Note that lZ r consists of matrix¬ 
valued functions having entries that are strictly proper rational functions of order r; the McMillan 
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degree of H r G lZ r could range as high as r • min(m,p), which could be significantly larger than our 
target, r. We assume that observations (evaluations) of H(s) are available at predetermined points 
in the complex plane, s = £ 1 , ..., Only observed values at these points: H(£j), for j = 1, 
will be necessary in order to derive our approximations. We proceed by seeking a solution to 

t 

min 5>||H r (&)-H(&)|£ ■ (1.1) 

t /V- /• 

i— 1 

Sanathanan and Koerner EZ, proposed an approach to solving ED that produces a sequence 
of rational matrix approximants H^^s) having the form Hr^(s) = N^^(s)/d^ fe ^(s), where N^^(s) 
isapxm matrix of polynomials of degree r — 1 or less and (s) is a (scalar-valued) polynomial 
of degree r. For each k , the coefficients of and dM' 1 are adjusted by solving a weighted linear 
least squares problem with a weight determined by . An important reformulation of the 

Sanathanan-Koerner (SK) iteration was introduced by Gustavsen and Semiyen [55] . which became 
known under the name Vector Fitting (VF). The term “vector fitting” is appropriate in light of the 
interpretation of the Frobenius norm, || • ||f, that appears in ED as a standard Euclidean vector 
norm, || • || 2 , of an ‘unraveled’ matrix listed column-wise as a vector in C mp . The Gustavsen-Semlyen 
VF method produces a similar sequence of rational approximants (s) = N ( k \s)/d( k \s) but now 
with and d^ defined as rational functions represented in barycentric form. Making central 
use of a clever change of representation at each step, the Gustavsen-Semlyen VF method achieves 
greater numerically stability and efficiency than the original Sanathanan-Koerner iteration. 

We describe both the SK and VF iteration in (E] and make some observations that contribute 
to our analysis of it in (El In particular, we note that the change of representation implicit in the 
VF iteration can be related to a change of representation from barycentric to pole-residue form. 
This change of representation is made explicit in 92.21 where we provide formulas that appear to be 
new. These formulas can be useful at any step of either the SK or VF iteration in order to estimate 
the contribution that each pole makes to the current approximation. This, in turn, is useful in 
determining whether, r, the initial estimate of McMillan degree, is unnecessarily large relative to 
the information contained in the observed data. We note that many authors have applied, modified, 
and analyzed VF, see e.g. EE EH], EH], m , EE EU, EH]. A MATLAB implementation vecfit3 
is provided at El] and is widely used. When applied to MIMO problems, high fidelity rational 
approximations are sought and generally achieved at the expense of a relatively high McMillan 
degree for the final approximant. By way of contrast, the approaches we develop here are capable 
of providing systematic estimates to the McMillan degree of the original function, H(s), and can 
produce high fidelity rational approximations of any desired McMillan degree, to the extent possible. 

In (El we analyze VF within a numerical linear algebra framework and discuss several important 
issues that are essential for a numerically sound implementation of the method. Although VF is 
based on successive solution of least squares (LS) problems, which is a well understood procedure, 
subtleties enter in the VF context. We review commonly used numerical LS solution procedures 
in EH and point out some details that become significant in the VF setting. In 93.21 we argue 
and illustrate by way of example that, as iterations proceed, the VF coefficient matrices in the 
pole identification phase tend to become noisy with a significant drop in column norms that can 
also coincide with a reduction in numerical rank. This prompts us to advise caution when rescaling 
columns in order to improve the condition number, since rescaling columns that have been computed 
through massive cancellations will preclude inferring an accurate numerical rank. 

Ill-conditioning is intrinsic to rational approximation and manifested through the potentially 
high condition numbers of Cauchy matrices that naturally arise. We introduce in 93.4.21 another 
approach toward curbing ill-conditioning. Using recent results on high accuracy matrix computa¬ 
tions, we show that successful computation is possible independent of the condition number of the 
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underlying Cauchy matrix, and we open possibilities for application of Tichonov regularization and 
the Morozov discrepancy principle, which provide additional useful tools when working with noisy 
data. In M3.51 we offer some suggestions for efficient software implementation and provide some 
algorithmic details that reduce computational cost. For the sake of brevity, we omit discussion of 
the algorithmic details that allow computation to proceed using only real arithmetic. 

Typically, the weights in ED are Pi = 1 and the sample points, are either uniformly or 
logarithmically spaced according to sampling expedience. We do consider this case in detail, but also 
propose, in §4, an alternate strategy for choosing pi and that is guided by numerical quadrature. 
This connects the “vector fitting” process with optimal H .2 approximation and follows up on our 
recent work, [2B]. Indeed, with proper choice of the nodes £, and the weights pi, we can interpret the 
weighted LS approximation of (11.11) as rational approximation in a discretized I-L 2 norm in the Hardy 
space of matrix functions that are analytic in the open right half-plane. This leads to a significant 
improvement in the performance of the VF approximation and also sets the stage for a new post¬ 
processing stage proposed in (}5l where we address the question of computing an approximation 
having predetermined McMillan degree. 

Rational data fitting has a long history (going back at least to Kalman 22) and continues 
to be studied in a variety of settings: For example, Gonnet, Pachon, and Trefethen m recently 
provided a robust approach to rational approximation through linearized LS problems on the unit 
disk. Hokanson in ||tjj presented a detailed study of exponential fitting, which is closely related 
to rational approximation. The Loewner framework developed by Mayo and Antoulas [D3 31] 
is an effective and numerically efficient method to construct rational interpolants directly from 
measurements. This approach also has been extended successfully to parametric EMU and weakly 
nonlinear problems. In this paper, we focus solely on matrix-valued rational least-squares 

approximation by VF. 

2. Background and Problem Setting. VF iteration is built upon the Sanathanan-Koerner 
(SK) procedure [52] . which we describe briefly as follows: A sequence of approximations having the 
form Hr fc) = N W (s)/(s) € TZ r , is developed by taking d(°\s) = 1, and then solving successively 
for N( fe+1 ) and d( fc+1 ) via the weighted linear least squares problem: 


Ak) - 


= £ 


tr M (fe) (&) I s 




min. for k = 0,1, 2, 


( 2 . 1 ) 


Since and d ^ have a presumed affine dependence on parameters, ED does indeed define a 
linear least squares problem with respect to those parameters. Notice also that once N^ fc+1 ^(s) and 
d^ k+1 ) (s) have been computed, only (s) is used in the next iteration as a new weighting func¬ 

tion. Although convergence of this process remains an open question, if the iterations do converge 
at least in the sense that, at some fc„, d( fc « +1 )(s) “is close to” rf( fc *)(s) at the sample points the 
process is halted and we take H r (s) = N* fc * +1 ^(s)/d^ fc * +1 - ) (s). Note also that before this last step 
(i.e., for k < A;*), any effort to compute the value of N^' +1 ^ is wasted. 

The error in (12.11) may be rewritten as = Yfi=i Pi |^ fe * +1 HCi)/^ fe *H^) f ||H r (£,) - H(£j)||^ , 
which corresponds to ED up to an error that depends on the deviation of |d( fe * +1 ) /d^”^ | from 1. This 
deviation becomes small as convergence occurs and can be associated with stopping criteria that we 
introduce in ( 13.4.31 If the iteration is halted prematurely, then the last step leaves d^ unchanged and 
redefines N( fe+1 ) as the solution to the LS problem £i=i P;||N( fe+1 )(&)/d (fc) (&) - H(&)||J. min. 
The implementation of this procedure depends on specific choices for the parameterization of (s) 
and d^ k \s) used in representing H.^(s); barycentric representations will offer clear advantage. 

2.1. VF=SK+barycentric representation. Suppose that at the fcth iteration step, we choose 
a set of r mutually distinct (but otherwise arbitrary) nodes, {£ '*} j=i • Consider a barycentric rep- 
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resentation for H^(s): 


H l k \s) = 


N W( S )_ f) 


v(fc)', 


, #eC pxm , tp) K> ,X) K> e C. 


m ,_(&) \(fe) 


( 2 . 2 ) 


d {k) (s) 1 + E^i^/^-Af)’ 

Observe that if t/ife ^ 0, Hfe(A^) = On the other hand, if c p^ = 0 then H^(s) has 

a simple pole at A.fe with an associated residue given explicitly in (12.91) of Proposition 12.II 


A new approximant is sought having the form, 

^(fc+i) 

H( fc+1 )(s) = —^ 




i+E;.i^‘ +1) /7-Af)’ 


$J W> £ € pxm , £ €. 


The weighted least squares error as described in en) is written explicitly as 


Ak) _ 


= £ 




(fc+i) 


$ 

E j 

7=i s* A ,- 


(fc) 


-H&) l + £ 


<A 


(fc+i) 


3 =i SJ A J 


(2.3) 


One of the distinguishing features of the Gustavsen-Semiyen VF method ESI emerges at this point: 
the explicit weighting factors l/|<i^^(^i)| 2 are eliminated through “pole relocation”, i.e., the nodes in 
the barycentric representation are changed in such a way so as to absorb the weighting factors. Note 

r r 

that if {Af +1) } are the zeros of d*- fe ^(s), then we can write S k \s) 

9=1 9=1 

and then 


t 


9=1 


9=1 


n^- A 9 fc) ) $ 


(fc+i) 


§ nj-ite-Af 1 ) H(a 


nte-Af>)+E^ +,> n({.-Af') 

9=1 3 =1 95 i 3 


nu^-A^) 


= £ 

2=1 


(fe+1) 

3 


r n«.-A<*>)$ 

9Al_ 

h n;=ite-A? +i) ) 


- H(fc) 


rife - A f)+£^ +i) rife- A f } ) 

9=1 1 = 1 9 A? 


ni =1 fe-Af +1) ) 


= e (fe). 


(2.4) 


The next iterate, Hfe 1 ^, will be represented in barycentric form using nodes Aq fc+1 \ q = 1,..., r. 
We assume simple zeros for simplicity. We introduce new variables, so that 


n( S -AW)+£^ +i) ri(«-Af ) ) 

9=1 _J = 1_ qAj _ _ 'Pj 


(fc+1) 


In the same way, we introduce new unknowns, so that 


(fc+i) ■ 




(fc+l) 


^ n:=i( s -Af +1) ) 


r $ 

= £ 1 


(fe+i) 


3=1 S ~ \ 


(fc+1) • 


(2.5) 































After this change of variables, the LS objective function from (12.311 appears as 


c (*0 - 


= £< 


$ 


(/c+i) 


t _\(k+i) 

7=1 si A ? 


- H(fc) 1 + 


A 


(fc+1) 


' t \(^+l) 

7 = 1 Si A 7 


( 2 . 6 ) 


Now, the values for {$j fc+1 -*}t =1 and that minimize in (12.61) will determine the 

next iterate H) fc+1 ^(s). The iteration continues by defining d^ k+1 \s) = l + )Cj=i <^. fc+1 )/(s — A^ +1 ^) 
and new poles A^ fc+2 ^ will be the computed zeros of rf( fe+1 )(s). 

Numerical convergence is declared and the iteration terminates at an index fc* when maxj | Lp l ' k " 1 \ 
is “small enough”. The zeros of d( k *\s) are extracted as the eigenvalues of the matrix diag(Aj fc *^)t =1 + 
(l,-- - , l) T ((p[ k, \■ ■ ■ ,<pi k *' > ) and assigned to {Aj fc * +1 ^}J =1 . The denominator d^ fc *^(s) is set to the 
constant 1 indicating numerical pole-zero cancellation, and the LS problem (12.61) is solved for 
(assigning k + 1 = fc*) with all terms replaced with zeros. The resulting VF approximant is 


Hr(«) = £ 

J=1 


$ 


(M 


s - Af * +1) ' 


(2.7) 


It is usually reported that numerical convergence takes place within a few iterations, provided that 
initial poles are well chosen. However in difficult cases, numerical convergence may not be achieved, 
and the iterate described in (12.21) may have a denominator that is far from constant. Even if the 
stopping criterion is not satisfied, the above procedure that yields EH) will be valid at any k , and 
may be interpreted as taking the last computed barycentric nodes, viewing them as poles, and then 
computing a best least-squares rational approximation in pole-residue form. We discuss this closing 
step of the iteration further at the end of d 2.3 1 The stopping criterion will be discussed in 33.4.31 

2.2. Pole-residue form of barycentric approximant. The barycentric representation of¬ 
fers many advantages for both the VF iteration and the SK iteration (i.e., with the explicit weighting 
byl/|d (fe) (&)l 2 as in El)- Nonetheless, the pole-residue representation is more convenient for anal¬ 
ysis and further usage. For example, a pole-residue representation of a (rational) transfer function 
leads immediately to a state space realization of the underlying dynamical system. Furthermore, we 
may wish to inspect, in the course of iterations, an approximant (12.21) with the goal of estimating 
the importance of the contribution of certain of its poles and the possible effect of discarding them. 
Hence, it would be useful in general to have an efficient procedure to transform a barycentric repre¬ 
sentation of a function into a pole-residue representation. Toward that end, consider apxm matrix 
rational function 

y' r j£i_ 

G (s)= , ^eC7 xm , e C, and |^-| + ||#,|| F > 0. (2.8) 

1 + 1 a-\j 

Assume that the barycentric nodes, Ai,..., X r are distinct and closed under conjugation (non-real 
values appear in complex conjugate pairs), and that the corresponding ipjS and 3>.,s have a compatible 
conjugation symmetry (ipj , 4?., real if A j real, pi = Tpj, <!>, = tfc, if Aj = A j). Define complementary 
index sets: 


Jo = {j ■ Pj = 0} and J x = {j : ipj ± 0}. 
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Then {A^ : j £ Jk}, {<fij '■ j £ Jk} are closed under complex conjugation for k = 0,1. Note that 
{Aj : j £ Jo} are among the poles of G(s). The remaining poles can be indexed as {A., : j £ J \}. 

Proposition 2.1. Assume, in addition to the above, that all poles ofG(s) in ESP are simple 0 
Then, 


gw = E 


Ri 


S — An 


E R ‘ 


jeJi 3 jeJo 

where the residues can be efficiently calculated as 


s — Aj 


Rj — 




1 + Sieji A ») 


jGj 0 ; R, = (>' ^ V ,3-, j G Jr. 

II.w,(A, - f^A.,- A, 


(2.9) 


Proof. The proof immediately follows from a calculation of residues, e.g. for j G Jo, Rj = 
lim s _ s . Ai (s - Aj)G(s). □ 

The first formula in (12.911 (for j G Jo) is a special case of the second one, which is actually given 
in ( 1231 ) . This reflects the fact that the change of variables (12.511 in the transition from SK to VF 
iterations implicitly seeks a pole-residue representation. 

2.3. Computing 1 11 and ipj k+1 K Measurements are naturally kept in a tensor § G C pxmxe 
with S(:,:,*) = = H(£j) + £^ G C pxm . S^) denotes a sampling of the transfer function at 

the node allowing also for measurement errors to be represented through £^\ Notice that the 

mode-3 fiber E>(u,v ,:) = (^Suv ,Suv , ..., SuJ 1 ^, Suv^j 6 C*, represents connections between input 
v and output u over all frequency samples. 

The expression for e can be matricized in several natural ways. The one followed in the VF 
literature is derived from a point-wise matching of input-output relations over all measurements. 
This is natural as it decouples the problem into p ■ m rational function approximations having a 
common set of poles. In terms of the matrix entries, the least squares error (12.611 can be written as 


p m 


^ (fe) = E^EE 

i =1 u—1 v=l 

m p 


E 


('3>( fe+1 h <A fc+1 ) 

> UV _ q(i) Tj 
(fc+1) °uv .(fc+1) 


- 5 (i) 


= 1 \€i ~ ^ 


fi - K 


= E E Vp (^ (fc+1) ’ ~D^W k+ V) - V p S(u, v, :) 

V=1 14=1 \ r J 

where V p = diag(0E) , D (w) = diag(^)f =1 , = (^ fe+1) ,..., # +1) ) T , and 

( (t>[ k + 1) )uv\ 


<^(fc+1) = 



c \( fc + !) 

€l A 2 

«i-4 fc+1) 

1 

1 

1 

S2 -^ + 1 > 

t 2 -A 2 k+1> 

( 2 -4 fc+1) 

i 

i 

i 

Z . (fc+i) 

Q-i -A i 
l 

Z A (fc+1) 

Q-1~ A 2 

1 

i 



(,-4 fc+1) 


<? ( k+1 \u,v ,:) = 


(* ( 2 k+1) )uv 


($( fc _+ 1 1) ) u „ 

V ($(. fc+i) )^ ) 


( 2 . 10 ) 


( 2 . 11 ) 


lr The general case of multiple poles is just more technical and it follows by standard residue calculus. 
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To minimize (12.101) . it is convenient to introduce the QR factorizations for 1 < u < p, 1 < v < m: 


V p ( < T( fc + 1 ),-£)(™)^( fe + 1 )) 


/ 

Vi 



Qt +1) I o 

0 0 


12 


\*Uiv ) 22 


=((Q (fc+1) )i ( Q { u k v +1) h (Q& +1) h) 


' * * * X X X \ 

0 * * X X X 

0 0 * x x x 

0 0 0 * ★ ★ 

0 0 0 0 ★ ★ 

0 0 0 0 0 ★ 

0 0 0 0 0 0 


Vo 


0 0 0 0 0 0 


6 / 


( 2 . 12 ) 


(2.13) 


where the unitary matrix has been partitioned as = ^(Q( fc + 1 ))i (q!V +1 ^) 2 (Q1u + 1 ^) 3 ), 
with block columns of sizes lxr,lxr,fx(<- 2r), respectively. The leading r columns of (12.121) are 
independent of (u,v) hence the initial part of the factorization, (Q^ fc+1 ^)i (i?^ fc+1 ^)ii = 'D p c tf( k+1 \ 
need only be done once. 

The LS residual norm may be decomposed as = e[ k> + e^ 1 + where 


4‘ ) = EE ( fl(H1, )i i* ik+1 H u ,v 

v=l u= 1 

,:) + (Ri k + 1] ) I2<p (fc+1) - (Q^lVpSiu, v ,:) 2 

m p 

e ( 2 ] = E E (^V +1) )22^ (fc+1) - (Q& +1) )2©pS(«,t;,:) 

H = 1 U— 1 

2 

, and 

2 

m p 

l’=EE (t +1) )3 V p S(u,v,:) 

v=l u= 1 

2 

2 



(2.14) 


where we have used M* to denote the conjugate transpose of a matrix M. Here, is a part of the 

f/c+l) f/c+1) 

residual that is beyond the reach of the unknowns $) ', p - - it corresponds to the component 

in the data that is orthogonal to the subspace of rational functions available with the current 
barycentric nodes. Only a set of new (better) barycentric nodes will incline this subspace toward 
the data in such a way as to reduce this component of the error. Extracting optimal information from 
a given subspace associated with the current barycentric nodes is achieved by minimizing +e^\ 
To that end, first note that we can assume that (i?^ fe+1 ' ) )ii is nonsingular since it participates 
in a QR factorization of V p ^ k+1 \ which in turn must have full column rank since {A^ A: ' ) }y =1 are 
presumed distinct and we (tacitly) assume that ^ X'l' 1 throughout the iteration. Thus, we can 

make exactly zero for any choice of 1 p( k+1 '> by taking, concurrently for all input-output pairs 
(u, v), with 1 < u < p and 1 < v < m, 

& k+1) (u,v,:) = (i? (fe+1) )n ((g (fc+1) )lP P §(u,u,:) - {Rt +1) )i 2 ^P {k+1) ) ■ (2.15) 

When the poles of the approximant are assigned to the barycentric nodes, {Aj , +I ' l }! ( ’_ 1 , then the 
residues in mm may be found by solving (12.151) with = 0. For other cases, the optimal 

ip ( k+ - 1 ) will be found by minimizing e^\ and this process is uncoupled from any information about 
^(fc+i) However, determining does involve a coupled minimization across all input-to-output 

(u,v) pairs. The computation (12.151) is skipped if we choose to proceed with the SK iterations. 

Once the maximal number of iterations is reached without reducing cp ■ fc - ) enough to be ne¬ 
glected, then in (12.151) also cannot be neglected without losing information. That is why the 
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approximant defined in (EH) uses the poles 1 . It follows from 92.21 that this is equivalent to 

solving (12.1511 with included in the right-hand sides and transforming the computed rational 

approximant from barycentric into pole-residue form. This is another elegant feature built into 
the Gustavsen-Semlyen VF framework. (If we use the original SK iterations with diagonal scalings 
and fixed poles, then the barycentric form can be transformed to pole-residue representation using 
Proposition 12.11 ) 


2.4. Global structure. This procedure can be represented as ||A^ fc+1 ^x — b|| 2 — > min, in the 
usual form, as follows. First, we specify that the indices (u, v ) in the p x m array will be vectorized 
in a column-by-column fashion, (u, v) l uv = p(v— 1) +u. Define A( fc+1 ) to be a block matrix, with 
pm x (pm + 1) block structure, each block of dimensions i x r. Only 2 pm out of pm(pm + 1) blocks 
are a priori nonzero. Initially, set A^ fc+1 ^ to zero and update it as follows: in the block row l uv set 
the diagonal block to V p and the last block in the row to —D p D < ' uv ' )e tf( k+1 ' 1 . The right-hand 
side is the vector b with the pm x 1 block structure, where the i uv th block is set to T> p S(u, v ,:). The 
concurrent QR factorizations ( 12 . 121 ) can be then represented by pre-multiplying A^ fe+1 - ) with unitary 
block-diagonal matrix (Q( fc + 1 ))*, with the t uv th diagonal block set to (Q^~ 1 ' > )*- It is easily seen 
that the rows of (Ql fc + 1 ))*A (fe+1 ^ can be permuted to obtain the structure illustrated in (12.161) f] 
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(2.16) 


Of course, the above matrices will not be used as an actual data structure in a computational routine. 

(k) 

But, this global view of the LS problem is useful for conceptual considerations. For instance, the 
part of the residual corresponds to the zero rows of (Q( fe+ 1 ))*A^ fc+1 ^ - they build the block of zero 

rows at the bottom of the row-permuted II(Q^ fc+ 1 ^)*A^ fc+1 \ see (12.161) . The corresponding entries 

(k) 

in the transformed right-hand side amount to £3 ' in the Euclidean norm. 

The LS problem with the block upper triangular permuted = n(Q( fc+ 1 ^)*A( fc+1 * and the 

corresponding partitioned right-hand side s( fc+1 ) = II(<2( fc+1 ))*b can be written as, see (12.161) . 


/p>(k+i) 

c [n] 
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M k+1) \ 

U k+1) J 


(2.17) 


2 Elements not displayed are zeros. 













































Algorithm 1 Vector Fitting - Basic Iterations 
1: Given: The sampling data H(G) for i = 1,.. ; maximal number of iterations fc m ax- 

2 : Set k t— 0 and make an initial pole selection A ( - fc+1 - ) £ C r . 

3: while { stopping criterion not satisfied and k < fc max } do 
4: Form A( fc+1 ) and b. 

5: Compute = n^P^ 1 ))* A< fc+1 ) and s < k+1) = n(Q( fc+1 ))*b and partition as in (1TT71) . 

6: Solve ||Bf 2 fc + 1 V (fe+1) - s^ +1) || 2 —t min for <p {k+1) ■ 

7: Set k t— k + 1 and compute A (fc+1 ^ = zeros( 1 + Y^j=i /( s — A^)). 

8: end while 

9: * = (B^,)"^. 


Remark 2.1. Observe that the iteration on ip^ proceeds independently of <i> and indeed, 
$ is obtained only in the final step, Line 9, which accomplishes the simultaneous determination of 
residues by minimizing \\V p (“if ( fc+1 )^ fc+1 ) (u, v, :) — S(u, v, :)) || 2 , for u = 1,... ,p, v = 1,..., m. This 
observation was first exploited in |22] . This may be implemented as a solution of an LS problem 
with multiple right-hand sides, and additional measures can be taken to compute more accurate 
residues, see M3. 4. 21 Stopping criteria (Line 3) will be discussed in M3. 4. 31 

3. Numerical issues that arise in standard VF. The key variables in VF are computed as 
solutions of LS problems, where the coefficient matrices are built from Cauchy and diagonally scaled 
Cauchy matrices, thus potentially highly ill-conditioned. Further, as we hope to capture the data by 
reducing the residual, we also expect cancellation to take place. These issues pose tough challenges 
to numerical analyst during the finite precision implementation of the algorithm. In this section, we 
discuss several important details that are at the core of a robust implementation of VF. 

3.1. Least squares solution and rank revealing QR factorization. To fully understand 
the global behavior of VF iterations in finite precision arithmetic, it is crucial to investigate all the 
details of an LS solver used in a robust software implementation. For example, consider Line 6 . in 
Algorithm [IJ i.e., consider the LS problem ||B[ 22 ]<p — s|| 2 —> min where we now drop all superfluous 
indices to ease notation. In a MATLAB implementation, the solution is obtained using the backslash 
operator, i.e., <p = B[ 22 ]\s, or using the pseudoinverse, i.e., ip = pinv(B[ 22 ])s, computed using the 
SVD and an appropriate threshold for determining numerical rank. The state of the art LAPACK 
library pT] provides driver routines xgelsy, based on a complete orthogonal decomposition, and xgelss, 
xgelsd, based on the SVD decomposition. 

We briefly describe the decomposition approach: In the first step, the column pivoted QR 
factorization is computed and written in partitioned form 

B [22] P = WT=(W 1 W 2 )(^W W *W = l, \\T [22] \\ F <e\\T [n] \\ F , (3.1) 

where e is a threshold value, e.g. e = ns. As a consequence of the Businger-Golub pivoting m, 

k 

^r|T,fc| 2 , 1 <i<k<r. (3.2) 

j=i 

In the case of differently weighted rows of B[ 22 j , numerical stability can be enhanced by using Powell- 
Reid complete pivoting [50] or by presorting the rows in order of decreasing oo-norm [15]. The actual 
size of Tjn] may be determined by an incremental condition number estimator, or by inspecting for 
gaps in the sequence |Tn| > |T 22 | > • • • > |Tnn|- If no such partition is possible, then T = Tr-m and 
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the block Tj 22 ] is void. In ill-conditioned cases, as we could have in Algorithm [Q such a partition is 
likely to be visible, see, e.g., Figure [3TTI Then, Tj 2 2 ] is deemed negligible noise and set to zero. This 
is justified by backward error analysis: there exists a small perturbation of the initial matrix B[ 22 j 
such that this part of the triangular factor is exactly zero. Then, an additional orthogonal reduction 
transformation Z (similar to QR factorization, see, e.g., LAPACK routine xtzrzf) is deployed from 
the right leading to a URV decomposition 

B [22] P » W ^ Z, and the solution p> = PZ* ■ (3-3) 

The vector ip in (13.31) is the minimal Euclidean norm solution of a nearby (backward perturbed) 
problem. However, in the rank deficient case, other particular choices from the solution manifold 
might be of interest. For instance, once we set Tj 22 ] in (13.11) to zero, we can use 

B [22] P » W ^g 11 T g 21 ^ , and the solution <p (0) = P > (3-4) 

with the same residual norm as ip in (13.31) . Note that while the MATLAB command B[ 22 ]\s computes 
tp(°\ the SVD based pinv(B[ 22 ])s and the LAPACK routines return the minimal norm solution <p. 
These details may also significantly impact the computation in Line 9 of Algorithm Q] (cf. Remark 
12.11) . Numerical rank deficiency will trigger truncation and in the case of the solution method in (13.dD . 
some of the residues in m will be computed as p x m zero matrices, thus effectively removing the 
corresponding poles from H r (s). We discuss this residue computation step in more detail in 33.4.21 
Determining the numerical rank is a delicate procedure and it should be tailored to a particular 
application, based on all available information and interpretation of the solution. For instance, what 
is a sensible choice for the threshold e in (13.11) and how do we decide whether to prefer the solution 
of minimal Euclidean length or the solution with most zero entries? What can we infer from the 
numerical rank of B[ 22 j? These issues are further discussed in 33.21 

3.2. Convergence introduces noise. In this section, we analyze and illustrate that as VF 
proceeds, the coefficient matrices in the pole identification phase tend to become noisy with a 
significant drop in column norms that may coincide also with a reduction in numerical rank. This 
prompts us to advise caution when rescaling columns in order to improve the condition number, since 
rescaling columns that have been computed through massive cancellations will preclude inferring an 
accurate numerical rank. Since VF simultaneously fits all input-output pairs using a common set of 
poles, it suffices to focus our analysis on only one fixed input-output pair (u,u). For simplicity of 
the notation, we drop the iteration index k , and take unit weights, i.e., T> p = le . If r is large enough 
and the poles have settled, then, with some small error , 

S (u, v, 1 : t) « ‘fix + error = ( Q)i(R)nX + error , 

where x = <P(u,v, 1 : r), see Remark |2.II Here we used the QR factorization (12.121) . Now, the right 
hand side in the error contribution e 2 in (12.141) that corresponds to the pair ( u , v ) is 

(Quv)lH u ’ v > 1 : R) = {.Quv)* 2 {Q)i{R)iix + (< 3 m,) 2 error = (Q uv )lerror. (3.5) 

The vectors of the structure (13.51) are building the vector s 2 in (12.171) . Furthermore, using (12.121) . 

(Ruv )22 = (Quv) 2 [ diag(S(u,u, 1 : t))^ = (Q uv )\ { [(ffx + error ) (l ... l)]o^}, 

exe 


to 















Fig. 3.1. The structure of the matrix B[ 2 2 ] an d column pivoted triangular factor in line 6. during the first 
two iterations in AlgorithmIn the first plot, showing the data in the first iteration, the column norms o/B[ 2 2 ] are 
marked with (red) ■— and the column norms its triangular factor in the Businger-Golub pivoted QR factorization are 
marked with (blue) o —. The second plot shows the same information, but in the second iteration. 


where o denotes the Hadamard product. Hence, a jth column of (R uv )22 reads 

( /Vtei-Aj) 

( Ruv)22{',j ) = (Quv) 2 < [(Q)i(R)i\X + error } o I ; 

l Vi/^-AH 

which means that (R U v) 22 ('-,j) could also be small, depending on the position of A j relative to the 
£jS. The LS coefficient matrix B[ 22 j in line 6. of Algorithm Q] is assembled from the matrices (Ruv)22, 
and we can expect that it will have many small entries that are (when computed in floating point 
arithmetic) mostly contaminated by the roundoff noise. We illustrate this on an example. 

Example 3.1. We use the one-dimensional heat diffusion equation model [T3], obtained by 
spatial discretization of J^T(x, t) = a-^T(x, t)+u(x, t), 0 < x < 1, t > 0 with the zero boundary 
and initial conditions. The discretized system is of order n = 197. We generate i = 1000 samples 
and set r = 80. The structures of B[ 2 2 ] and its column pivoted triangular factor are given in Figure 
13.11 The column norms of B[ 22 ] in the first step are so particularly ordered due to the ordering of 
the initial poles A j = a, ± iidj , where the /3jS are logarithmically spaced between the minimal and 
the maximal sampling frequency and ay = —/ 3 ,. Note the sharp drop in the column norms in the 
second iteration, after the relocated poles induced better approximation. 

3.3. The Quandary of Column Scaling. In the VF literature, it is often recommended to 
scale the columns of the LS coefficient matrix in line 6 of Algorithm [T] to make them all of the same 
Euclidean length, before deploying the backslash solver, and then to rescale the solution, see, e.g., 
[Ml . One desirable effect of this column equilibration step is to reduce the effective condition number 
of LS coefficient matrix (see e.g., 031 [55]). While this can be beneficial, nonetheless this tactic also 
may have a variety of deleterious effects and, in our opinion, must be considered with caution. Of 
foremost concern, following the discussion from 93.21 is that scaling noisy matrix columns effectively 
increases the influence of noise, and allows these noisy columns to participate in the column pivoting 
process of the QR factorization, with a possibility that some of them become drafted and taken 
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Fig. 3.2. ( Examvle \3.1\ cont.) The structure of the pivoted triangular factors (cf. \ 3.21) j o/B[ 2 2 ] the first two 
iterations in Algorithm ^ The plot shows the values \Ta\ of the unsealed B[ 22 ] (marked with (red) ■—) and of the 
column equilibrated B[ 22 j (marked with (blue) x —). 


upfront as important. This, in turn, interferes with the rank revealing process and possibly precludes 
truncation based on a partition as in JO- For illuminating discussions related to this issue we refer 
to [3D] ■ 123 1 ESI • In the following two examples we illustrate the potentially baleful effects of column 
scaling in the particular context of VF iteration. 

Example 3.2. We continue using the heat model from Example 13.11 In Figure I3TD1 we show, 
for the first two iterations, the moduli of the diagonal entries (17^1) of the triangular factors of B[ 2 2 ] 
(appearing in line 6 of Algorithm [T] without and with column equilibration. Note that, due to the 
diagonal dominance (EH), the distribution of 1 7’,;,| is decisive for numerical rank revealing. 

We now illustrate how the numerical rank deficiency may be manifested during the VF iterations. 
We solve the LS problems for the tp^s using only a simple modification of MATLAB’s backslash 
operator: first reorder the equations so that the rows of the coefficient matrix have decreasing (.oo 
norms, and then apply backslash^ This stabilizes the LS solution process in much the same way as 
does Powell-Reid complete pivoting (see [15] 02)- We use 1000 frequencies and r = 80. The samples 
are matched perfectly with both our implementation of VF and vectfit3 [53] (up to relative errors 
of the order of 10 -13 ). The first plot in Figure 13751 shows the structure of the denominators < p ^ 
throughout ten iterations. Each tp^ is represented by the sorted vector of log 10 (|^( fc )|/||(^( fe )||i), 
and the zero entries of ip^ are not shown. For k = 1,..., 10, the values of sum(|y)( fc )|/||<p( fc )|| 00 > e) 
are, respectively, 53, 43, 38, 30, 44, 44, 44, 40, 42, 44. (If we restart the approximation with r = 53, 
the number of nonzero coefficients throughout the iterations are 40, 36, 38, 41, 41, 42, 41, 41, 39, 41.) 
To interpret these numbers, we compute the Hankel singular values ay > • • ■ > <7197 and superimpose 
them on the graph as log 10 (cq/eri) - those values marked by o. Since the ays are forming a “devil’s 
staircase” and there is no clear cutoff index. For instance, (J^/ai ~ 1.28e — 11, 036 /ay ~ 2.17e— 13, 
< 744/171 ss 1.12e — 16, < 753/171 ~ 9.00e — 17. (If we use the backslash without the initial row pivoting, 
the numbers of nonzero coefficients throughout the iterations are 53, 44, 37, 42, 50, 50, 50, 49, 49, 
50.) 


3 Recall the discussion in EH 
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Fig. 3.3. fExample \‘3.2{ ) History of the first 10 iterations, k = 1,..., 10. Plots show logio(\<Pj k ^\//\\(p( k ^\\i) vs 
j; only nonzero coefficients are shown. Normalized Hankel singular values are represented as diamonds. 


Recall from Proposition 12.II that the barycentric nodes corresponding to = 0 are the poles 

(k) 

of the current approximant Hi- ' and that the corresponding residues are accessible by an explicit 
formula m , thus allowing an estimate of the contribution of the pole and perhaps discarding it 
and reducing r. The matching of the number of the untruncated entries in the LS solutions with 
the number of significant Hankel singular values is striking and may offer machinery to readjust the 
order of the approximant, r, during the iterations. This matching cannot be guaranteed in general 
but offers a great promise. Full understanding of the potential of the VF for determining the order of 
the underlying system (e.g., as in the Loewner framework [46] ) remains an important open problem. 
The right-hand side plot in Figure FP1 also shows log 10 (|<^ fc )|/||<^ fc )||i), but with the column scaling 
of the least squares coefficient matrix and rescaling the solution, as in vectfit3. Note that, once the 
scaling is applied, connection to the Hankel singular values decay is lost; supporting our discussion 
on the undesirable effects of column scaling in VF. 

3.4. mimoVF: Putting the pieces together. In this section, based on our preceding analysis 
of VF for matrix-valued rational approximation problem, we start testing our new implementation 
of VF. We will call the new implementation and the corresponding MATLAB toolbox mimoVF. This 
section will provide examples for verification and validation of mimoVF. We will compare our im¬ 
plementation with the original vectfit3 |54pl and show that our proposed modifications based on the 
theoretical analysis can substantially improve the results. While on average vectfit3 performs well, 
in the ill-conditioned cases, it has difficulties with numerical issues addressed in this paper. 

For the resulting rational approximation H r , define the tensor § r (:, :,i ) = H r (£j), i = 1,... ,£, 
and the relative LS error as 7 = ||§ — S r ||F/||§||F- Recall that i) = H(£j) e C pxm , i = 
1 ,... ,£, contains the original samples that are either measurements, or computed from a state space 
realization of the underlying LTI dynamical system. 

3.4.1. A stress test. We consider a model for the ISS 1R module [13. with m = p = 3. 
The underlying dynamical system has dimension n = 270. This example presents a difficult test 
case with rather vivid dynamics. The model is very hard to approximate and presents significant 
challenges to model reduction; see [3j2]. To that end, we choose r = 50 and take l = 300 samples. 


4 The options used deploy the relaxed vector fitting technique 1351 . 


13 


















Fig. 3.4. (Example of 43.1. 1 1 ) Comparison of mimoVF and the vectfit3 on the ISS 1R module with initial 
poles set as the eigenvalues of a pseudo-random stable real matrix, i = 300 and r = 50. (The frequency response 
magnitudes of each of the possible nine input/output pairings is plotted in solid blue; the corresponding frequency 
response magnitudes from rational approximations provided by vectfit3 and mimoVF appear as dashed red lines.) 
The purpose of the experiment is to check the robustness of the numerical implementation in the case of unpropitious 
distribution of the barycentric nodes. The first plot shows the output of vectfit3 (with the relative error 7 > 10 ), and 
the second of mimoVF (7 < 1 O' 2 ), both after two iterations. 


The initial barycentric nodes are chosen as the eigenvalues of a pseudo-random real stable r x r 
matrix, a potentially poor initialization. Using these initial nodes, two iteration steps are taken 
both in vecfit3 and mimoVF. The goal of this example is to illustrate that once the computations in 
each iteration step are made more robust (following our preceding analysis), high-fidelity rational 
approximants can still be achieved even with a small number iteration count or even in the cases 
of poorly initial choices of barycentric nodes. The results of vectfit3 and mimoVF are shown on 
Figure [331 where we depict the amplitude frequency response plots for the data and for the rational 
approximants. Note that this model has m = 3 inputs and p = 3 outputs, there are 9 input/output 
channels corresponding to the different lines in Figure FOl The figure clearly illustrates the mimoVF 
performs significantly better than vecfit3 for this example. Recall that both functions are given the 
same set of initial poles. Despite this unfavorable choice of initial poles, a restricted number of 
iterations, and ill-conditioned LS matrices, mimoVF succeeds to compute a model with relative LS 
error below 7 ps 6.45 • 10~ 3 . On the other hand, the relative error due to vectfit3 is 7 « 10.41; a 
significantly higher value than the error due to mimoVF. It is possible that allowing vectfit3 to iterate 
further might realign the poles better, leading to a smaller LS error and an accurate approximate. 
But, of course, this comes with additional costs since every step of the iteration requires to solve a 
potentially large-scale LS problem; especially when m and p are large. Therefore, any reduction in 
the iteration count is a gain in terms of computational efficiency. 

3.4.2. How to compute the residues in the final step. One of the advantages of the 
barycentric implementation of the SK iterations over the original approach using polynomial repre¬ 
sentations is in the avoidance of high powers of (which may cause overflow and underflow in finite 
precision arithmetic) as producing ill-conditioned Vandermonde matrices. The additional scalings 
by l/|G^ fc -*(£i)| 2 ! which is another potential source of ill-conditioning, has been elegantly removed by 
the VF formulation and compensated by reallocating the barycentric nodes A) . However, once the 
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VF iterations are completed, one needs to solve for the final residues $ in Line 9. of Algorithm [T] 
for the converged poles. This step needs to be performed carefully as the coefficient matrix that 
determines the residues for given set of poles is a Cauchy matrix, which, together with Vander¬ 
monde matrices, is among the most notoriously ill-conditioned matrices. To illustrate, the spectral 
condition number of an arbitrary 100 x 100 real Vandermomde matrix is larger than 3 ■ 10, and 
the condition number of the 100 x 100 Hilbert (Cauchy) matrix is more than 10 150 . The column 
norms of the latter are between 0.07 and 1.3, thus no column scaling can substantially reduce the 
condition number. Furthermore, the additional weightings pi (whose values may spread many orders 
of magnitude) may further worsen the conditioning of the least squares coefficient matrix. This all 
is a menace to the final computed residues, in particular when the order r is sufficiently high and in 
cases of unfavorably distributed nodes. This issue has to be addressed if the method is to be applied 
to truly challenging problems with complex dynamics and of high orders, for instance, for m,p,r 
in hundreds. In this section we focus our attention to the very last step - given poles of a rational 
approximant, how to best numerically extract the residues. 

Example 3.3. We continue to use the ISS 1R example from H3.4.1I However, in this case, 
we choose good initial poles of the form A j = ay ± ify, where the (positive) /3jS are log-spaced 
over the frequency sample interval and aj = —/3j as often recommended in the VF literature for 
good initial pole selection. We take £ = 500 and choose r = 100. Recall that the underlying 
system has dimension n = 270 with p = m = 3; therefore with r = 100 and £ = 500 samples, one 
expects to obtain a very good approximation. The amplitude frequency response plots are depicted 
in Figure 1531 for both vecfit3 and mimoVF, once again illustrating that mimoVF outperforms vecfit3; 
the relative LS errors due to mimoVF and vecfit3 are respectively, 9.47 • 10 _1 and 4.90 • 10~ 3 . In 
this case, vecfit3 suffers from the numerical ill-conditioning of the final residue computation. Thus, 
this example shows that even a plenty of good initial barycentric nodes, one does not necessarily 
guarantee a good approximation due to the numerical issues arising in the residue computation step. 


A regularization approach for residue extraction. Even though mimoVF performed rela¬ 
tively well in Example 13.31 some of the less-dominant input/output pairs were not captured as accu¬ 
rately as one would prefer; see Figure 1531 In this section, we consider a regularization technique to 
further improve the final residue extraction step in mimoVF. Recall Line 9. of Algorithm[T]to compute 
the final residues: solving $ = B(Esi. As noted in Remark l2.1l this corresponds to the simultaneous 

determination of the residue matrices by solving \\V p (ff( k+1 ')(p( k + 1 ') (u, v, :) — § (u,v, :)) || 2 —> min, 
u = 1,... ,p, v = 1,..., m. To simplify the notation, denote this LS problem by WVpffx — h \\2 —> 
min, where & = is a Cauchy matrix as in (12.111) . h is the corresponding scaled right-hand side, 
A is closed under conjugation and and the solution vector should also be closed under conjugation. 
Such a constrained problem can be replaced by an equivalent unconstrained LS problem 


(V&,x\ 


x — 



W&x-hWz 


min 


(3.6) 


with the coefficient matrix again of the diagonally scaled Cauchy structure, 'if = ( V p © 'D p ') c €^ ^ x . 

The SVD of can be computed to high relative accuracy based on the pivoted LU decomposition 
ni^na = LDU , where each entry (including the tiniest ones) of the computed factors L, D , U is 
computed to high relative accuracy, and L, U are well conditioned. The ill-conditioning of jif is 
revealed in the diagonal matrix D. This decomposition can be used immediately in an LS solver 
H2, or it can be used to compute an accurate SVD ng, he Ezina which is then used to compute 
an approximate LS solution. can be severely ill-conditioned so that small changes of £,s and A^s 
can cause significant perturbation of the SVD. However, we can consider the values of £,s and A^s, 
as stored in the machine memory, as exact and attempting to compute accurate SVD is justified. 
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Fig. 3.5. {Example l3.3l ) Comparison of mimoVF and the vectfit3 on an ISS example with initial poles set as 
log-spaced (imaginary parts log-spaced over the frequency range). The first plot shows the output of vectfit3 (with the 
relative error 7 « 9.47 • 10 and the second of mimoVF (7 Pi 4.90 • 10 — after one iteration. In this example, 
I — 500 and r = 100. The second row shows relative approximation error — H r (^) ul ,|/|H(^) ul ,| for the four 

dominant input-output pairs over all samples. Since the order of the underlying system is n = 270 and p = m = 3, 
r = 100 should provide good approximation. 


Let ^ = WSV* be the SVD and let the unique^ LS solution be x = VT^W* = Yli=i v i{w*h)/a % . 
Unfortunately, an accurate SVD is not enough to have the LS solution computed to high relative 
accuracy, and additional regularization techniques must be deployed. This is in particular important 
if the right-hand side is contaminated by noise. In the Tichonov regularization, we choose ji > 0 


5 Since all nodes are distinct and the poles are assumed simple, the matrix is of full column rank. 
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Fig. 3.6. Comparison of mimoVF and the vectfit3 on an ISS example with initial poles set as log-spaced (imagi¬ 
nary parts log-spaced over the frequency range). In this example, t = 500 and r = 100. The first plot shows the output 
of vectfit3 after two iterations (with the relative error 7 « 1.77 • 10 — 1 ), and the second of mimoVF (7 ~ 4.90 • 10 — 
after one iteration. 


and use the solution of \\% J x 


h\\\ + ^ 2 ||x||| “^ m i n , explicitly computable as 


E 

2=1 


af + p 


;{w*h)l 


(3.7) 


The parameter /i can be further adjusted using the Morozov discrepancy principle |3H|, he., to achieve 
\\ioXfj, — h \\2 ~ v, where v is the estimated level of noise 8h in the right-hand side, v « || 2 • 

Example 3.4. Here we continue Example 13. 31 use the same data, and apply vecfit3 and mimoVF 
where we use the ^preceding regularization approach in the final residue computation step. Using 
accurate SVD of if we compute x^ as in (EH with an ad hoc choice of /.t = 10 3 . The results after 
only one iteration of mimoVF and two iterations of vectfit3 are shown on Figure ETBl mimoVF still 
has smaller LS error; 4.90 • 10 -3 compared to 1.77 • 1CU 1 due to vectfit3. It is important to note that 
mimoVF achieves this better performance after only one iteration. vectfit3 has the error of 1.77 • 10 _1 
after two iterations. This shows that a more robust implementation may reduce the total number 
of iterations needed to reach satisfactory approximation; thus reducing the overall computational 
complexity. 

Remark 3.1. The LDU decomposition, which is the initial step of the accurate SVD, can also 
be used in the pole identification phase to compute accurate QR factorization of Cauchy matrices 
at all stages of the computation. We have not included those modifications, for the sake of brevity 
of the presentation. Interested readers can find the details of such an approach in Ell- 

3.4.3. Stopping criterion. It should be noted that the stopping criterion in the VF framework 
is rather vaguely specified. To the best of our knowledge, the VF literature does not provide a precise 
and numerically justified strategy of halting the iterations. For instance, vectfit3 allows only running 
for a given fixed number of iterations. 
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Following our recent analysis [26] . we propose to declare the |<^ fc+1 ^|s “small enough” if 


E 




U l^e(Af +1) )| 


= 0 (fc+1) < e, where e is a suitable threshold. 


(3.8) 


This seems appropriate because max s6 a |d( fe+ 1 )(s) — 1| < 6^ k+l \ and in m we can write 


s (i) (i+E 


(fc+i) 

Vj 


3=1 & ^ 


(fe+1) 


) = S (i) + ASW, ||AS«|| f < 6»( fe+1 )||S w ||i,, 


thus interpreting this as introducing backward perturbation into the data. In fact, this interpretation 
can be a guidance for choosing the threshold value e by following a discrepancy principle, i.e., so 
that this backward error matches the estimated size of the noise level on the input. 

3.5. Guidelines for efficient implementation. We now further discuss implementation de¬ 
tails that are relevant for an efficient software implementation of mimoVF or Algorithm [l] in general. 
Recall that VF for MIMO systems with m input and p inputs will require p ■ m QR factorizations 
(12.1211 of size £ x (2r) before advancing toward finding ipj k+1 \ It is clear that this is a demanding 
computational challenge even for moderate m and p\ e.g., in the case m = p = 100 , it will require 
p-m = 10000 QR factorizations of size £ x (2r). Since these factorizations are independent, they can 
be very efficiently parallelized and the whole computation can be optimized for a multi-core com¬ 
puting machinery. This has been nicely described by Chinea and Grivet-Talocia M, who showed a 
nearly ideal speedup on a four quad-core architecture. 

3.5.1. Efficient computation of B[ 2 2 ]. It was pointed out earlier that the QR factoriza¬ 
tions in (12.121) are independent of (it, v) in the first r columns and (_R( fe + 1 ))n from (12.151) can 
be computed by a single QR factorization, optionally with column pivoting, of 'D p ( rf^ k+1 \ i.e., 
Vp^ k+1 ' , H = ( T(fc 0 +1> ). Further, the introduction of rank-revealing column pivoting II 

in this factorization incurs a negligible overhead, while preserving the structure (12.161) . We dis¬ 
cussed in ED that this pivoting is very important for numerical robustness of the LS solution as 
well. In an LAPACK-style implementation, the matrix can be computed and stored in form 

of r Householder vectors (using Xgeqp3), and then, using Xormqr, (y( fc+1 ))* can be concurrently 
applied to all ~V p D^ uv ">^ k+1 \ u = 1 ,,p, v = 1 ,,m. Then, it only remains to compute the 
QR factorizations of the (r + 1 : £, r + 1 : 2 r) submatrices of — (V^ k+1 ^)*V p D^ uv ^^ k+1 ' > . One should 
note that in a blocked QR factorization a similar computation is done anyway in the process of 
computing the QR factorizations (12.121) . The computed triangular r x r factors are the (2,2) blocks 
in (12.121) that build the matrix b!^. Hence, the saving of this modified approach is equivalent to 
the cost of ( pm— 1) QR factorizations of size fxr, or, approximately, const - ( pm— 1 )£r 2 . The total 
work on the QR factorizations (12.121) without this modification is 4 • const ■ pm£r 2 . 

(k) 

Further, when we are solving only for the 99 ) ; during the VF iterations, the elements of Q*A , 
denoted by x in ( 12 . 121 ) and (12.161) are not used in the pole identification phase, but they are computed 
as (-R^ + 1 ))i 2 parts of the QR factorizations (12.121) . On the other hand, once the poles are fixed, 
the LS problem is solved with the approximant of the form ( 12 . 21 ) . but with the unit denominator, 
d( k \s) = 1, or, equivalently, with = 0, j = 1,..., r. This means that, for computing (12.71) by 

the Algorithm[lJ we do not compute the matrices (R™ + 1 ' l )i 2 , which further reduces the complexity. 
Our implementation of this more efficient approach is based on adapting the LA PACK’S functions 
Xormqr, Xlarft and Xlarfb. 
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3.5.2. Locally pivoted factorization. In the presence of noise and ill-conditioning, pivoting 
is essential when using the QR factorization. Hence, we propose to include pivoting in the procedure 
outlined in 1)3.5.11 More precisely, the QR factorization of 'D p ^ k+1 ' ) is computed with column 
pivoting, i.e., 2? p < ^ fe + 1 )n( fc+1 ) = ( T< g +1) )- This enhances the accuracy of the computed 

residues in line 9. of Algorithm [T] In a software implementation, Line 9. is reshaped into the LS 
problem with the coefficient matrix T^ k+l ' ) and with m-p right-hand sides, for all input-output pairs. 
Optionally, one can use the truncation discussed in 33.11 or the accurate SVD as explained in 33.4.21 

Further, we also advocate the use of pivoting when computing the QR factorizations of the 
(r + 1 : l, r + 1 : 2r) submatrices of — {y ( ' k+1 ' > )*'D p D^ uv ' ,( ^’^ k+1 \ This increases the accuracy of the 
computed matrix Bj 22 j in line 6. For details how pivoting influences the accuracy of the rows of 
the computed triangular QR factor we refer to [55]. Furthermore, this may allow (in the cases of 
numerical rank deficiency, as revealed by the pivoted QR factorization and discussed in 33.11) to set 
certain numbers of rows of 22 to zero and thus increased the number of zero rows in b| 22 ^ 1 ' ) 

and reduced the complexity of Line 6 in Algorithm [lj where, as described in m the LS solver 
starts with the QR factorization with column pivoting. 

The pivoted QR factorization of the tall and skinny matrix Bj- 22 , ; can be computed by e.g., first 
computing the QR without pivoting using the techniques of [5] , and then computing the pivoted QR 
factorization of the computed r x r triangular factor, or e.g. as in HZ]. These approaches become 
particularly attractive if p ■ m and r are large. 

Remark 3.2. Interestingly, we do not need to compute the 22 S. Instead, we can build 

the matrix Br 22 j " 1 ' 1 from the (r + 1 : £, r + 1 : 2r) submatrices of — (y^ k+1 ^)*V p D^ uv ^^ k+1 ' ) . This 

/'Lit') 

will lead to increased number of rows in Bj 22 j , but overall it reduces the complexity with potential 
gain increased if the QR factorization of B P7 | 11 is computed using the strategies of 0 . E- 

4. Numerical Quadrature in mimoVF for discretized W 2 approximation. The frame¬ 
work for mimoVF is based on the algebraic least squares (LS) error minimization (11.11) where one 
usually chooses the weights pj = 1 and the nodes are usually selected heuristically. In [5S1 . 
for single-input/single-output (SISO) systems, we have shown that with the underlying dynamical 
system in mind, reformulating the discrete LS problem as discretization of an underlying continuos 
H 2 error measure and then choosing the nodes and weights by an appropriate numerical quadrature 
improves the performance of VF significantly. The same conclusion holds in the MIMO case as 
well since, once the common set of poles has been determined, mimoVF works separately on each 
input-to-output pair. We illustrate these considerations briefly in this section. 

4.1. 'H/j approximation and numerical quadrature. The algebraic least squares error is 
closely related to the V .2 system norm. More precisely, consider the space 2 +"' of p x m matrix 
functions M(s), analytic in the open right half-plane C+ = {s € C : S(s) > 0}, such that 
sup x> o ||M(oi + ny)\\ 2 F dy < 00 . The space +"is a Hilbert space with the associated inner 
product and norm defined by 

(M 1 ,M 2 ) W2 = Trace (M 1 (iu)M 2 (su) T j dw, ||M ||„ 2 = (^ j l|M(«w)|£ du 

(4.1) 

The R 2 approximation problem is, then, to find a degree-r rational approximant, H r (s), that min¬ 
imizes the Ti .2 error norm ||H — H r ||% 2 over all degree-r rational function H r (s). Such an optimal 
rational approximant must satisfy certain Hermite tangential interpolation conditions; for details 
we refer to [51 [33]. The Iterative Rational Krylov Algorithm (IRKA) of Gugercin et al. [33] is a 
numerically effective iterative algorithm that constructs degree-r rational approximatants satisfying 
the T^-optimality conditions. 
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Our goal in this section is to repeat the success of m for SISO systems, i.e., improve the 
performance of mimoVF by formulating the discrete LS measure as discretization of the continuous 
%2 error. Towards this goal, approximate the 7^2 error with a quadrature role to obtain 



H r (®a>)||f.dca 


i 

^>, 2 ||Hfe) - H r (fc)||S- + p\ M+[ |H - H r | 2 ] + pi M_[|H - H r | 2 ](4.2) 

3=1 


where M± [G] are linear functionals of G that capture information about asymptotic behavior of G 
at ±oo. Note that the usual VF formulation correspond to p+ = = 0, with all other pj = 1, and 

choosing sampling nodes to be equidistant and in complex conjugate pairs. Thus, the usual VF 
objective function is as a composite trapezoid quadrature rule for the integral in (Id.21) approximating 
the H 2 error. As we discussed in ESI, more effective quadrature options may be considered, e.g., 
Gauss-Legendre, Gauss-Kronrod, and Gauss-Hermite quadrature rules. We do not go into the details 
of what quadrature method to choose here; since our main point is just to illustrate that mimoVF 
can perform much better once formulated as a discretized 7^2 measure. 

4.2. A numerical example. Here, with a simple example, we illustrate the effect of choosing 
the sampling points and the weights using numerical quadrature. We use the Clenshaw-Curtis type 
quadrature rule developed by Boyd m- We use the ISS 1R module (32] with m = 3 inputs and 
p = 3 outputs. We use only 100 function evaluations (£ = 200) and apply mimoVF to this data set 
for different r values. Resulting relative 7^2 errors are shown in Table H~T1 below. The quadrature- 
based selection yields the smallest error in each case; for r = 20 and r = 30, it leads to one order of 
magnitude improvements. 


Method 

r = 10 

r = 20 

r = 30 

r = 40 

mimoVF without Quadrature 

3.4104 x lCT 1 

3.1775 x 10" 1 

1.2778 x 10" 1 

5.0257 x 10~ z 

mimoVF with Quadrature 

2.5209 x 10" 1 

4.6074 x 10" 2 

3.3226 x 10~' 2 

2.1436 x 10 -2 


Table 4.1 

Effect of quadrature nodes and weights on mimoVF 


5. Controlling the McMillan Degree. Let C <C pxm and {Ay}J C C denote the final 

set of matrix residues and poles, respectively, resulting from mimoVF. To ease notational clutter, we 
drop the iteration index k. The associated rational matrix approximant can be represented as 


H r(s)=J2 

7=1 


* 

s- Xj 


(ip ■ ■■ flp) 

V 


l p 

s — \ r 



(5.1) 


If H r (s) has simple poles, then H r (s) has nominal McMillan degree deg(H,,(s)) = Xq=i rankly). 
Evidently, r < deg(H r (s)) < rmin(p, m). Note that deg(H r (s)) will be strictly larger than r unless 
the residues, 4?^, all have rank 1, and indeed, deg(H r (s)) can be potentially much larger than r 
if either the input space or output space has significant dimension. McMillan degree is a proxy 
for the complexity involved in evaluating an approximant, therefore it is generally desirable and 
sometimes necessary to reduce the McMillan degree of a rational approximant to the target value 
of r, while retaining its approximating quality as much as possible. One straightforward method 
to accomplish this is to use truncation as suggested in For each j = 1, ..., r, find the best 
rank-one approximation of 4?., « Cj bj where Cj £ C p and bj £ C m . However, it is often the case 
that the residues 4>j are not close to rank-one matrices and so, truncation to rank-one residues 
can substantially increase the LS error. EZj suggested using the SVD to determine the numerical 
ranks of the 4?jS and truncating them to their respective best low-rank (not necessarily rank-one) 
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approximations; EH also proposed Gauss-Newton correction, but no details on how to proceed in 
this direction were provided. 

We propose here two different approaches toward retaining rank-1 residues, allowing us to achieve 
a true McMillan degree of r while keeping the approximation quality as high as possible. The first 
approach, presented in d 5. H is based on a nonlinear least-squares minimization, the second one, 
presented in d 5.2 1 combines mimoVF with well-established optimal systems-theoretic model reduction 
methodologies. 


5.1. Rank-one residue correction via Alternating Least Squares. We seek an optimal 
rational approximant, H r (s), having the same poles, {A.,}!) as the mimoVF approximant, but taking 
the form 


H r (s) = 


CjbJ 


“ s - Xj 
j=i J 


= C(sI —A) 1 B t , where A = diag(Aj)f 1; 


C=(a 

B=(h 


c r ) 

br). 


(5.2) 


Optimality here will mean that C and B are chosen so that H r (s) satisfies 

f Cl b T i\ /H(£i)N 


mm 

C, B 




IE 


*=i j =i 


CjbJ 

& — A 


-H(&)|||- = min||(Sf®]g 

\c r bjJ V H ^)y 


ll- 


(5.3) 


A weighting factor pi > 0 can also be attached to each sample; yet for simplicity of presentation, 
we take all pi = 1. In (15. 3D . *€ £ C ixr denotes the Cauchy matrix = 1 /(^ — Ay). In many 
applications, H r (s) should be real-valued for real-valued s. In that case, a constraint is added that 
the poles A j and the residues CjbJ must be closed under conjugation: all non-real poles appear in 
complex conjugate pairs, say Xj , Aj+i = A,,-; Cj, bj are real if A j is real, otherwise c_,+i = cj, bj +1 = bj. 

The nonlinearity of the LS error (15.311 with respect to the variables, B and (7, can be evaded 
by reformulating the problem in terms of alternating least squares (ALS): if B (alternatively, C) 
is fixed, then m becomes a linear least squares problem in terms of C (alternatively, B). So, 
we minimize alternately with respect to C (holding B fixed) and then with respect to B (holding 
C fixed), repeating the cycle until convergence. We provide some algorithmic details below and 
illustrate the effectiveness of ALS iteration with an example. An analogous approach has been used 
for “residue correction” in realization-independent (data-driven) approaches to optimal TL -2 model 
reduction [?]• 

Correction of C. Assume B is fixed and seek an updated C (with conforming conjugation 
symmetry) that will minimize the LS error (15.31) . To that end, we vectorize the error matrix column¬ 
wise and write the fcth column (k = 1,..., m) of the residual matrix in (15.31) as 

/(6i) fc I \ /cA /H(£i)eA / Cl \ /H(£r)eA 

M I : - : =M(diag(R(fc,:))®I p ) : .(5.4) 

V (b r ) k lj \c r ) \m e )e k J \c r ) \n(&)e k J 

Stacking all columns together, the problem becomes: minimize the Euclidean norm of the residual 

/H(A)ei\ 


/ M{Ai ® I p )\ / Cl 


\M(A m ® Ip)/ 


H(^) ei 


H(A)e 


\H(^)e ro / 


, where A* = diag (B(i, :)), i = 1,..., m, (5.5) 
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with conjugation symmetry constraints: Cj is real if A j is real and Ck = cj, if A& = A j. Note that 
the number of rows above is £ ■ m • p, and the number of unknowns is p • r. In practice, t is much 
larger than m,p, r, and it is always assumed that t > 2 r. 

Let = Q(q) = QR be the QR factorization, where Q = (q Q = Q(:,l : r). Then 

M = (Q ®> I p )((q) ® Ip) is the QR factorization of M. Multiplying the blocks in the residual (15.51) 
by (Q* ® Ip) and using the block-partitioned structure of Q , we obtain an equivalent LS problem: 


({R® Ip)(A l( g)Ip)\ 

0 


(R ® Ip)(A m ® Ip) 

V o ) 



/(Q*®Ip)vec(S(:,l,:))\ 
(Q* 0 Ip)vec(§(:, 1,:)) 


( Q* ®Ip)vec(§(:,m,:)) 
\{Q* ®Ip)vec(§(:,m,:))/ 


(5.6) 


The blocks (Q*®Ip)vec(S(:, i ,:)), i = 1,..., m, in the right-hand side constitute a part of the residual 
that cannot be influenced with any choice of the CjS and the corresponding (£ — r) • p • m equations 
(with the corresponding zero rows in the coefficient matrix) are dropped, i.e., only the thin QR 
factorization = QR is needed. This reduces the row dimension of the problem from £ ■ p ■ m to 
r ■ p ■ m. Using the properties of the Kronecker product, we can further simplify it to 




(RAA 



/ cA 




® Ip 


: 



\RA m ) 



\C r J 


/ {Q* 0 Ip)vec(S(:, 1,:)) A 


\(Q* <8 Ip)vec(S(:, m,:))/ 


(5.7) 


To solve (15.71) we compute the QR factorizations 


Rb = U ( " ) , Rb ® Ip = {U ® Ip)( ( "T ) ® Ip), where R B = 


/MU 


e C” 


(5.8) 


\RA m J 


and, using the partition U = (u tj'j , we reduce the problem to solving the triangular system 



/cA 

/ (Q* ® Ip)vec(S(:, 1,:)) \ 


(■T ® Ip) 

: = (U*®Ip) 


(5.9) 


\c r ) 

\(Q* ® Ip)vec(§(:,m,:))/ 



Note that only the thin QR factorization R B = UT is needed. Folding the unknowns back into the 
structure of C we obtain, using that ( Q* ® I p )vec(§(:, i ,:)) = vec(S(:, i, :)Q* T ), 


vec(CT r ) = ( U* ® Ip) 


( vec(S(:,l, 

Q* T )\ 





= ( U * 0 Ip)vec(^ 


. S(:,m,:)Q* T )) 

\vec(S(:, m 

0 Q* T )J 





= vec((§(:,l,:)Q* T ... §(:, m, :)Q* T ) U* T )- 


(5.10) 


As an alternative to solving (15.91) . C can be computed efficiently as the solution of a triangular 
matrix equation. The formula C = (§(:, 1, :)Q* T ... S(:,m, :)Q* T ) U* t T~ t is rich in BLAS 3 

operations and can be highly optimized. Finally, we note that the QR factorizations involved can 
be done with pivoting, but we omit details for the sake of simplicity. 
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Fig. 5.1. (Example l5.ll ) Illustration of the truncation of mimoVF and the ALS correction. (Truncation applied 
to the mimoVF output shown on the right graph on Figure 13.41 Only one ALS iteration is used.) The relative TL 2 
error of H r is x ~ 1.50e — 01. 


Correction of B. If the matrix C is fixed and we want to update B , we use the preced¬ 
ing procedure, with a few simple modifications. First, transpose the residuals at each ^ to get 

J2' j= i . — H(£i) T . As a consequence, swap the roles of the CjS and the bjS , and use §(i,:,:) 
instead of §(:,*,:). The rest follows mutatis mutandis. 

A numerical example. We illustrate the usefulness of the ALS correction process in building 
a final approximant H r that has exact McMillan degree r. We use the data of Example 13.4. 11 and 
the output of mimoVF after the second iteration. The simple truncation of the residue matrices 
causes the LS error jump from 7 ss 6.45 • 10 -3 to 7 « 2.72, and one step of ALS correction reduces 
it down to 7 ~ 1.41 • 10 -2 . This improvement is evident in Figure [5Jl 

5.2. Rank-one residue correction via 'H 2 /U .00 model reduction approaches. The ALS 

iteration described above is built purely upon algebraic least squares error minimization. However, 
if the underlying context relates the rational approximants to dynamical systems, then it may 
be advantageous to perform this reduction using well-developed systems-theoretic reduction tools. 
Recasting our rank-one residue correction problem into this setting, we consider constructing an 
r-th order system H r that closely approximates the VF computed model H r in some appropriate 
system norm. 

The %2 norm discussed in 0 is the most natural choice and the first one we consider. This 
approach is compelling when weights and nodes in mimoVF are chosen using an appropriate quadra¬ 
ture as in HU and the algebraic LS measure is viewed as a discretized 'H 2 measure. In this case, the 
complete procedure; both mimoVF step and reduction to true McMillan degree-r will be performed 
with the "Hg system norm in mind. To achieve this goal, i.e., to minimize ||H r — H r ||-H 2 over all stable 
rth order H r , we will apply the optimal 'H 2 approximation method IRKA of |53j as modified in [7] for 
a realization independent procedure^ Note that the output of mimoVF, H,.(s), has the McMillan 
degree up to rmin(p, m); thus H r (s) can have a modest state-space dimension. If, for example, 
r = 80 and m = p = 50, H r can have degree as high as 4000. Thus, it is important to perform this 

6 It is not required that the underlying transfer function is rational. 
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second reduction step effectively. A particularly attractive aspect of the IRKA framework of [7] is 
that it needs only function and derivative evaluations at dynamically generated points. This works 
perfectly in our setting since the explicit state-space form of the mimoVF output in (15.11) makes these 
computations trivial. This approach can be viewed as a data driven implementation of the IRKA 
the measurements are fed into mimoVF to produce an intermediate model, a surrogate of the order 
r > r, based on measurements, and then this intermediate model is reduced by IRKA to its locally 
best rth order approximant. 

The T~ioo norm is another commonly used system norm. For a stable dynamical system with 
transfer function H(s), the TLoo norm is defined as HHH^ = sup^ eR ||H(bw)|| 2 - For details, we refer 
the reader to m- The commonly used approach to model reduction towards obtaining a small T~Lqo 
error measure is Balanced Truncation (BT) [371 S3- Even though BT requires solving two Lyapunov 
equations, once again particular state space realization in EH allows straightforward solution of 
the Lyapunov equations and makes the BT related computations cheap. Thus, we may also employ 
BT in reduction to true McMillan degree-r without much additional computational cost. 

5.3. An aggregate procedure: mimoFIT. Our overall approach to adapting VF to matrix¬ 
valued rational approximation consists first of the mimoVF process (described in detail in (E) and 
ED, followed by a post-processing step that performs the reduction to true McMillan degree-r with 
minimal loss of fidelity. This post processing stage can be performed either by the ALS correction 
of (15.11 or by systems-theoretic approaches such as IRKA or BT as described in ( 15.21 We will refer 
to this two-step process as mimoFIT. 

Numerical Examples. Here, we illustrate the performance of mimoFIT with four numerical 
examples. In each case, we investigate the effect of the methodology employed in the post-processing 
stage on the overall approximation quality. We also compare the final models produced by mimoFIT 
with the optimal-?^ approximations obtained by IRKA. 

5.3.1. Heat Model. We consider the Heat Model from the NICONET Benchmark collection 
m-, the model has m = 2 inputs and p = 2 outputs. We use only 20 function evaluations (£ = 40 
samples due to complex conjugacy) and obtain rational approximations of order r = 6 and r = 10 . 
Table 15.11 lists the resulting relative I-L 2 errors due to different approaches. The first row is the error 
due to the output of mimoVF. Note that this approximation has order r x m = 2r since it has 
full-rank residues. This is not our final approximation and is included here as a reference point. We 
obtain a true degree-r approximant using four different approaches: (i) simple rank -1 truncation of 
the residues by SVD, (ii) ALS correction of (15.11 (iii) IRKA on the degree-2r output of mimoVF to 
reduce it to degree-r (iv) BT on the degree-2r output of mimoVF to reduce it to degree-r. These 
four methods are labeled, respectively, as mimoFIT-(Trnct), mimoFIT-(ALS), mimoFIT-(IRKA), and 
mimoFIT-(BT). The first observation is that the simple rank-1 truncation of the residues by SVD 
leads to a high loss of accuracy compared to the ALS correction; this is most apparent in the r = 10 
case where mimoFIT-(Trnct) has one order of magnitude higher error than mimoFIT-(ALS). For both 
cases, mimoFIT-(IRKA) and mimoFIT-(BT) perform extremely well (especially mimoFIT-(IRKA)) and 
even with a true degree-r approximant, they almost match the accuracy of the degree-2r mimoVF 
approximant; i.e., reduction from 2r to r causes a negligible loss of accuracy. The last row indicates 
the relative H 2 error associated with the optimal approximant from IRKA. As expected, IRKA yields 
smaller error; we do not anticipate to beat the continuous optimal approximation via a discretized 
least-square measure. However, it is important to note that mimoFIT-(IRKA) and mimoFIT-(BT) 
with only 20 function evaluation yield results close to those obtained by IRKA; this is especially true 
for r = 10 . 

5.3.2. ISS-1R Module. We repeat the above studies for the ISS 1R module [32] with m = 3 
inputs and p = 3 outputs. We use 100 function evaluations (£ = 200 samples) and obtain rational 
approximations of order r = 20 and r = 30. Table [5771 depicts the resulting relative 7^2 error values 
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Method 

r = 6 

r = 10 

mimoVF (degree: 2r) 

^6530^10^“ 

1.0759 x 10” a 

mimoFIT-(Trnct) 

3.7022 x 10~ 2 

2.6137 x 10” 2 

mimoFIT-(ALS) 

1.8218 x 10~ 2 

2.8774 x 10“ a 

mimoFIT-(IRKA) 

1.7359 x 10~ 2 

1.1686 x 10" a 

mimoFIT-(BT) 

3.0604 x 10~ 2 

1.1591 x 10” a 

IRKA 

8.5566 x 10” a 

1.0925 x 10” a 


Table 5.1 

The relative TL 2 errors due to mimoVF, mimoFIT and IRKA. 20 function evaluations 


for the same methods used in the previous example in H5.3.1I As in the previous example, both 
mimoFIT-(IRKA) and mimoFIT-(BT) yield very accurate results and show negligible loss of accuracy 
in reduction from the intermediate 3r approximant to the final degree-r approximant; the order of 
the mimoVF approximant is reduced three-fold yet not much accuracy is lost. mimoFIT-(IRKA) and 
mimoFIT-(BT) again yield approximation errors close to that of IRKA. The main difference from the 
previous case is that in this case even the ALS correction suffers from the loss of accuracy as the 
simple truncation approach. 


Method 

r = 20 

r = 30 

mimoVF (degree: 3 r) 

4.6074 x 10” 2 

3.3226 x 10” 2 

mimoFIT-(Trnct) 

1.2904 x lO' 1 

1.2508 x lO' 1 

mimoFIT-(ALS) 

1.2558 x 10" 1 

1.2223 x 10 1 

mimoFIT-(IRKA) 

7.7305 x 10~ 2 

4.4757 x 10” 2 

mimoFIT-(BT) 

7.7457 x 10~ 2 

3.3483 x 10" 2 

IRKA 

6.7779 x 10~ 2 

1.1423 x 10” 2 


Table 5.2 

The relative 'H 2 errors due to mimoVF, mimoFIT and IRKA. 100 function evaluations 


5.3.3. ISS-12A Module. We now investigate the larger ISS 12A module [ 32 ] with m = 3 
inputs and p = 3 outputs. We focus on this problem since the underlying system of degree n = 
1412 is very hard to approximate with a lower order system; it presents significant challenges to 
model reduction, not necessarily from a computation perspective but from an approximation quality 
perspective. As illustrated in [ 32 ], the Hankel singular values decay rather slowly, so to obtain a 
reduced model with a relative error tolerance of 10 —3 , one needs a reduced model of order at least 
226 even using balanced truncation. We use 250 function evaluation, obtain rational approximations 
of order r = 80 using mimoFIT, and compare the result with the continuous optimal approximation 
IRKA. Table [5751 depicts the resulting relative %2 error values. As before, both mimoFIT-(IRKA) and 
mimoFIT-(BT) show negligible loss of accuracy in reduction from the intermediate 3 r approximant 
to the final degree-r approximant and have approximation errors close to that of IRKA. mimoFIT- 
(Trnct) and mimoFIT-(ALS) perform reasonably well as well in this case with mimoFIT-(Trnct) having 
the largest error among the four. 

5.3.4. A power system example with large input/output space. This example results 
from small-signal stability studies for large power systems. The specific model we consider here is 
one of the models from the Brazilian Interconnect Power System (BIPS); refer to 050 for details. 

7 This model can be downloaded from https://sites.google.com/site/rommes/software 
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Method 

r = 80 

mimoVF (degree: 3r) 

1.4678 x 10" 1 

mimoFIT-(Trnct) 

2.4549 x IO" 1 

mimoFIT-(ALS) 

2.2075 x 10” 1 

mimoFIT-(IRKA) 

1.5116 x 10" 1 

mimoFIT-(BT) 

1.5130 x 10” 1 

IRKA 

1.1317 x 10” 1 


Table 5.3 

The relative 'H .2 errors due to mimoVF, mimoFIT and IRKA. 250 function evaluations 



Fig. 5.2. (Example 15.3.41 ) Large-scale power syster 
Right plot: The sigma plots of the full and reduced model. 



io (rad/sec) 

example: Left plot: Decay of the Hankel singular values 


The underlying dynamical system has dimension n = 13250 with m = 46 inputs and p = 46 
outputs. We apply mimoFIT-(BT) to obtain our rational approximant. We use £ = 200 frequency 
samples. The mimoVF (first step of mimoFIT) is applied with r = 40. Due to m = p = 46, the output 
of mimoVF has an effective McMillan degree of r x m = 1840. Note that this is only marginally a 
reduced model, having a McMillan degree roughly 14% of the originally system order. The decay of 
the leading Hankel singular values of the intermediate model is shown in the upper plot of Figure 
15.21 Note the slow decay; even after the 300 th one, the normalized Hankel singular values are still 
above the threshold of 10 -4 . This system is difficult to reduce. Indeed, in earlier works, even for 
simpler versions of the model having a smaller number of input and outputs (such as m = p = 28), a 
reduced model of degree 291 was used; see m- We choose the final degree to be a point at which the 
normalized Hankel singular values have decayed below 2 x 10 -4 , leading to a final McMillan degree 
of 253 (around 2% of the original). The sigma plots, i.e., ||H(®w)||2 vs u £ 1, for the full model and 
the final mimoFIT approximant are shown in the lower plot of Figure [5721 As the figure illustrates, 
the mimoFIT approximant does an excellent job in approximating the underlying dynamics. 

6. Conclusions. The VF method, as originated by Gustavsen and Semiyen, has been and con¬ 
tinues to be an important tool for rational approximation and many authors have applied, modified, 
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and analyzed this approach. Although their method is based on successive solution of linear LS prob¬ 
lems, which are well understood problems in of themselves, we find that subtleties enter that can 
degrade both the performance and accuracy of current VF approaches, especially for matrix-valued 
rational approximation of modest dimension which often produce extremely poorly conditioned 
problems. These issues include 

1) balancing the potential conflict between rank revealing pivoting in QR factorizations used 
in LS solvers and column-scaling used to improve conditioning; 

2) avoiding redundant computation within the multiple subproblems solved as part of the large 
LS problem that arises; 

3) the need for rigorous termination criteria and the efficient recovery of the best possible 
rational approximant in case the iteration must be terminated prematurely; 

4) the use of regularized least squares and the discrepancy principle, both implemented using 
high accuracy linear algebra methods; and 

5) control of the McMillan degree of the resulting rational approximant. 

In this paper, we have considered these issues carefully, together with other more minor ones. We 
have integrated these developments into a robust, efficient implementation of VF for matrix-valued 
rational approximation, called mimoVF, that appears to be both faster and more accurate than 
currently available implementations. Further, we have connected the underlying discrete LS approx¬ 
imation problem to a continuous optimal Ti .2 approximation problem through numerical quadrature, 
which motivated a reformulation of the original VF objective as a weighted LS problem. For es¬ 
sentially the same cost as mimoVF, we were then able to use this reformulation to significantly 
improve the quality of the approximation. Finally, we have offered here an aggregate procedure, 
called mimoFIT, that combines mimoVF with % 2 /'H 00 -based approximation methods that yield high- 
fidelity rational approximants with low McMillan degree. 
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